Statement#

Given two strings, str1 and str2, find the length of the shortest string that has both the input strings as subsequences.

Note: A string XX is a subsequence of string YY if deleting some number of characters from YY results in the string XX.

Let’s assume that we have two strings as follows:

  • str1 = "yabc"
  • str2 = "xabc"

There can be multiple strings that have str1 and str2 as subsequences, for example, "xyaabcc" and "xyabbc", but the shortest string that has these input strings as subsequences is "xyabc". Please note that the sequence of alphabets in the string can be altered.

Constraints:

  • 11 \leq str1.length, str2.length 500\leq 500
  • str1 and str2 consist of lowercase English letters.

Examples#

No.

str1

str2

Length of Shortest Common Supersequence

1

"bcf"

"bdcf"

4

2

"dynamic"

"programming"

15

Try it yourself#

Implement your solution in the following playground.

Python
usercode > main.py
Input #1
Input #2
Shortest Common Supersequence

Note: If you clicked the “Submit” button and the code timed out, this means that your solution needs to be optimized.

Hint: Use dynamic programming and see the magic.

Solution#

We will first explore the naive recursive solution to this problem and then see how it can be improved using the Longest Common Substring dynamic programming pattern.

Naive approach#

A naive approach is to try all the supersequences, one character at a time.

The two changing parameters in our recursive function are the two indices, i1 and i2 which are incremented based on the following two conditions:

  • If the characters at i1 and i2 are a match, skip one character from both the sequences and make a recursive call for the remaining lengths.
  • If the characters do not match, call the function recursively twice by skipping one character from each string. Return the minimum result of these two calls.

Let’s implement the algorithm as discussed above:

Shortest Common Supersequence using recursion

Note: Please observe that if you include the test case commented out in the driver code, the solution is likely to time out. Try to solve the larger problem using the dynamic programming solutions provided below and see the difference.

Time complexity#

The time complexity of the naive approach is exponential O(2n+m)O(2^{n+m}), where nn and mm are the lengths of the input sequences.

Space complexity#

The space complexity of this naive approach is O(n+m)O(n+m), and it is used to store the recursion stack.

Optimized solution using dynamic programming#

Now, let us improve the recursive solution using top-down and bottom-up approaches.

Top-down solution#

This approach is pretty much identical to the naive approach. The only difference is that we create a lookup table to store the results of our subproblems in a 2-D array. Prior to making a recursive call to the function, we check if the desired call’s result exists in the lookup table. If it does, we return the previously stored value. Otherwise, a recursive call is made to the function.

Created with Fabric.js 3.6.6 0 1 2 0 1 2 3 0, 0

1 of 19

Created with Fabric.js 3.6.6 0 1 2 0 1 2 3 0, 0 1, 1

2 of 19

Created with Fabric.js 3.6.6 0 1 2 0 1 2 3 0, 0 1, 1 1, 2

3 of 19

Created with Fabric.js 3.6.6 0 1 2 0 1 2 3 0, 0 1, 1 1, 2 2, 3

4 of 19

Created with Fabric.js 3.6.6 0 1 2 0 1 2 3 0, 0 1, 1 1, 2 2, 3 3, 4

5 of 19

Created with Fabric.js 3.6.6 0, 0 1, 1 1, 2 2, 3 3, 4 0 1 2 0 1 2 3 1

6 of 19

Created with Fabric.js 3.6.6 0, 0 1, 1 1, 2 2, 3 3, 4 0 1 2 0 1 2 2 3 1

7 of 19

Created with Fabric.js 3.6.6 0, 0 1, 1 1, 2 2, 3 3, 4 0 1 2 0 1 2 2 3 1

8 of 19

Created with Fabric.js 3.6.6 0, 0 1, 1 1, 2 2, 1 2, 3 3, 4 0 1 2 0 1 2 2 3 1

9 of 19

Created with Fabric.js 3.6.6 0, 0 1, 1 1, 2 2, 1 2, 3 2, 2 3, 4 0 1 2 0 1 2 2 3 1

10 of 19

Created with Fabric.js 3.6.6 0, 0 1, 1 1, 2 2, 1 2, 3 2, 2 3, 4 2, 3 0 1 2 0 1 2 2 3 1

11 of 19

Created with Fabric.js 3.6.6 0, 0 1, 1 1, 2 2, 1 2, 3 2, 2 3, 4 2, 3 0 1 2 0 1 2 2 3 1

12 of 19

Created with Fabric.js 3.6.6 0, 0 1, 1 1, 2 2, 1 2, 3 2, 2 3, 4 2, 3 3, 2 0 1 2 0 1 2 2 3 1

13 of 19

Created with Fabric.js 3.6.6 0, 0 1, 1 1, 2 2, 1 2, 3 2, 2 3, 4 2, 3 3, 2 0 1 2 0 1 2 2 2 3 1

14 of 19

Created with Fabric.js 3.6.6 0, 0 1, 1 1, 2 2, 1 2, 3 2, 2 3, 4 2, 3 3, 2 0 1 2 0 1 2 2 2 3 1

15 of 19

Created with Fabric.js 3.6.6 0, 0 1, 1 1, 2 2, 1 2, 3 2, 2 3, 1 3, 4 2, 3 3, 2 0 1 2 0 1 2 2 2 3 1

16 of 19

Created with Fabric.js 3.6.6 0, 0 1, 1 1, 2 2, 1 2, 3 2, 2 3, 1 3, 4 2, 3 3, 2 0 1 2 0 1 3 2 2 2 3 1

17 of 19

Created with Fabric.js 3.6.6 0, 0 1, 1 1, 2 2, 1 2, 3 2, 2 3, 1 3, 4 2, 3 3, 2 0 1 2 0 1 3 3 2 2 2 3 1

18 of 19

Created with Fabric.js 3.6.6 0, 0 1, 1 1, 2 2, 1 2, 3 2, 2 3, 1 3, 4 2, 3 3, 2 0 1 2 0 4 1 3 3 2 2 2 3 1

19 of 19

Let’s implement the algorithm as discussed above:

Shortest Common Supersequence using memoization

Note: Please observe that if you now uncomment the large test case in the driver code, the dynamic programming solution will still work.

Time complexity#

Since the dimensions of the lookup table are the lengths of the two strings, there are only that many problems to solve. This implies that the time complexity of the top-down approach is O(m×n)O(m \times n), where mm and nn are the lengths of the input strings.

Space complexity#

The space complexity of this approach is O(m×n)O(m \times n), which is the space required to store the lookup table.

Bottom-up solution#

In the bottom-up approach, we calculate the smaller values and build larger values from them.

We create a lookup table of size (m+1)×(n+1)(m+1)\times(n+1). We fill the 0th0^{th} row with values from 00 to mm and similarly the 0th0^{th} column with values from 00 to nn. The constants in the 0th0^{th} row tell the length of the shortest common supersequence if str2 is empty. Likewise, the constants in the 0th0^{th} column tell the length of the shortest common supersequence if str1 is empty. Using these pre-filled values in the lookup table we iterate through both strings in a nested for loop, starting i1 and i2 from 11, and check if any of the two conditions are TRUE:

  • If the character str1[i1-1] matched str2[i2-1], the length of the shortest common supersequence would be one plus the length of the shortest common supersequence till i1 - 1 and i2 - 1 indices in the two strings.
  • If the character str1[i1-1] does not match str2[i2-1], we consider two shortest common supersequences - one without str1[i1-1] and one without str2[i2-1]. Our required shortest common supersequence length is the shortest of these two super sequences plus one.

Let’s implement the algorithm as discussed above:

Shortest Common Supersequence using tabulation

Note: Please observe that if you now uncomment the large test case in the driver code, the dynamic programming solution will still work.

Time complexity#

The time complexity of the code is O(m×n)O(m \times n) as we iterate over both the strings in a nested for loop.

Space complexity#

The space complexity is O((m+1)×(n+1))O((m+1)\times(n+1)) as we create a lookup table of this size to store values.

Can we do better?#

We can further improve the bottom-up approach by using a 2-D lookup table of size 2×n2 \times n instead of m×nm \times n. If we notice in the algorithm above when filling up the cells of a specific row, we always require the values of the previous row; that is, for filling the cell lookup_table[i1][i2], we require the values of lookup_table[i1][i2-1], lookup_table[i1-1][i2], and lookup_table[i1-1][i2-1]. Therefore, there is no point in storing all the previous rows. Instead, we can start by filling the first row of the lookup table ourselves, and for each subsequent row, we compute all the values using the first row and store them in the second row of the lookup table. Then, we replace the first row with the second one so it can be used in the next iteration.

The time complexity will remain the same, O(m×n)O(m \times n) because we still have to do the calculations for the input strings. The space complexity, however, reduces to O(2×n)O(2 \times n) since we are only using a lookup table of two rows.

Longest Common Subsequence

Minimum Number of Deletions and Insertions